
ASC Cluster Design

Cluster diagram

The architecture of the entire cluster is depicted in Figure 1.


Hardware Resources

| Item | Name | Configuration | Qty |
|---|---|---|---|
| Login Node | Inspur NF5688M5 | CPU: Intel Xeon Gold 5318Y, 24 cores, 2.1 GHz × 2; memory: 64 GB × 16, DDR4-3200; disk: 1.92 TB SATA SSD × 2; power estimate: 5318Y TDP 165 W, memory 70 W, disk 50 W | 1 |
| CPU Compute Node | Inspur NF5688M6 | CPU: Intel Xeon Gold 6348, 28 cores, 2.6 GHz × 2; memory: 64 GB × 16, DDR4-3200; disk: 1.92 TB SATA SSD × 2; power estimate: 6348 TDP 235 W, memory 42 W, disk 300 W | 2 |
| GPU Compute Node | Inspur NF5688M6 | CPU: Intel Xeon Gold 6348, 28 cores, 2.6 GHz × 2; memory: 64 GB × 16, DDR4-3200; disk: 1.92 TB SATA SSD × 2; GPU: NVIDIA A100 SXM4, NVLink 600 GB/s, 80 GB HBM2, 1,555 GB/s memory bandwidth, max TDP 400 W; power estimate: 6348 TDP 235 W, memory 42 W, disk 300 W | 1 |
| HCA card | InfiniBand/VPI card | ConnectX-5 VPI adapter card, FDR/EDR IB (100 Gb/s) and 40/50/100 GbE, dual-port QSFP28, PCIe 4.0 x16, tall bracket; power estimate: 18 W | 3 |
| Switch | GbE switch | 10/100/1000 Mb/s, 24-port Ethernet switch; power estimate: 100 W | 1 |
| Switch | EDR InfiniBand switch | SB7800 InfiniBand EDR 100 Gb/s switch system, 36 QSFP28 non-blocking ports; 300 W typical power consumption | 1 |
| Cable | Gigabit CAT6 cable | CAT6 copper cable, blue, 3 m | 3 |
| Cable | InfiniBand cable | Mellanox MCP1600 -E0xxEyy direct attach copper (DAC) cable, 100 Gb/s QSFP28, IB EDR, 3 m, black, 26 AWG | 3 |

Software Resources

| Item | Name | Version |
|---|---|---|
| Operating system | CentOS | 7.9 |
| Compiler | mpiicx, mpiicpx | 2024.0.2 |
| Compiler | icx | 2024.0.2 |
| Math library | Intel MKL | 2024.0.2 |
| MPI | OpenMPI | 4.0.5 |
| MPI | Intel MPI | 2024.0.2 |
| GPU-accelerated application | CUDA Toolkit | 12.4 |

System analysis

The estimated computing performance of the cluster is as follows (FP32):

  • CPU: 2.6 GHz (Xeon Gold 6348 base frequency) × 28 cores × 6 sockets (three dual-socket compute nodes: 2 CPU nodes + 1 GPU node) × 1 AVX-512 FMA unit × 2 FLOPs per FMA × (512 bit / 32 bit FP32 lanes) = 13977.6 GFLOPS = 13.9776 TFLOPS
  • GPU: 19.5 TFLOPS (A100 peak FP32) × 2 GPUs = 39 TFLOPS
  • Total: 52.9776 TFLOPS
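The peak figures above can be re-derived with a short script; the constant names below are illustrative, not from any tool, and the socket/GPU counts follow the estimate in this section.

```python
# Peak FP32 estimate for the cluster, reproducing the System analysis figures.

GHZ = 2.6            # Xeon Gold 6348 base frequency
CORES = 28           # cores per socket
SOCKETS = 6          # 3 dual-socket compute nodes (2 CPU nodes + 1 GPU node)
FMA_UNITS = 1        # AVX-512 FMA units counted in this estimate
FLOPS_PER_FMA = 2    # fused multiply-add = 2 FLOPs
LANES = 512 // 32    # FP32 lanes per AVX-512 vector

cpu_gflops = GHZ * CORES * SOCKETS * FMA_UNITS * FLOPS_PER_FMA * LANES
gpu_tflops = 19.5 * 2  # two A100 GPUs at peak FP32

total_tflops = cpu_gflops / 1000 + gpu_tflops
print(f"CPU: {cpu_gflops:.1f} GFLOPS, GPU: {gpu_tflops} TFLOPS, "
      f"total: {total_tflops:.4f} TFLOPS")
```

Note that the estimate counts one AVX-512 FMA unit per core; counting both units on the Gold 6348 would double the CPU term.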

The power consumption estimation for resource utilization is presented in Table 3.

| Name | Power consumption |
|---|---|
| Login Node | 450 W × 1 |
| CPU Compute Node | 750 W × 2 |
| GPU Compute Node | 1600 W × 1 |
| GbE Switch | 100 W |
| InfiniBand Switch | 250 W |
| InfiniBand/VPI Card | 18 W × 3 |
| Total | 3954 W |
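As a quick arithmetic check on the power budget, the table entries can be summed in a few lines (the dictionary below is illustrative, with quantities taken from the hardware table):

```python
# Per-component power estimate (watts) and quantity, from Table 3.
components = {
    "Login Node":          (450, 1),
    "CPU Compute Node":    (750, 2),
    "GPU Compute Node":    (1600, 1),
    "GbE Switch":          (100, 1),
    "InfiniBand Switch":   (250, 1),
    "InfiniBand/VPI Card": (18, 3),
}

total_w = sum(watts * qty for watts, qty in components.values())
print(f"Total estimated power: {total_w} W")  # 3954 W
```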

The HPC cluster design delivers substantial peak compute (52.9776 TFLOPS FP32) and efficient communication, with EDR InfiniBand for MPI traffic and Gigabit Ethernet for management, making it well suited to AI and research workloads. However, its high power consumption and limited storage, together with a single login node and a single GPU node, could constrain scalability and reliability.